TECHnalysis Research Blog

June 1, 2023
Hybrid AI is moving generative AI tech from the cloud to our devices

By Bob O'Donnell

As exciting as generative AI tools like ChatGPT, Google’s Bard and Microsoft’s numerous Copilots may be, they all currently face one restriction: you have to be connected to the Internet to use them. For most people and in most situations, that isn’t a big problem, but wouldn’t it be great if you could use them on your computers and smartphones even with a poor connection, or none at all?

Not only would this expand the range of situations in which you could take advantage of their impressive capabilities, but it could also bring a number of other important, if not necessarily obvious, benefits. First, it turns out that the computing power (and the electrical power) required to run these generative AI tools is currently massive. That means the companies offering these services are spending a lot of money to enable them, and eventually those costs could be passed on to the consumers and businesses that use them.

Second, there are some important security and privacy-related benefits to not running everything in the cloud. In many of the early versions of these generative AI tools, whatever you type into them is tracked and fed back into the large language models (LLMs) powering these services, as part of what’s called the model training process. The services also use this information to better personalize what these tools generate for you.

In fact, some of the more advanced generative AI tools are likely to evolve into something akin to digital personal assistants that can help plan and organize tasks and meetings for you. Unlike first-generation tools like Cortana and Siri, however, these generative AI-powered tools will be able to do so with more context and knowledge about you (if you let them, of course).

As cool as this may be, there’s often a tradeoff in terms of privacy. Just as a real-world personal assistant needs to know a lot about a boss’s schedule and work, so too does a digital assistant need to know about your work and schedule to be as effective as possible. As more of the work powering these AI models shifts onto devices, however, less of this information needs to be transferred to the cloud, making for a more private solution.

The way to solve both the power and privacy issues with generative AI is to leverage a concept called distributed computing, where you essentially split the computing “work” between the cloud and devices. When it comes to power, if some of the computations that used to happen only in the cloud can be done on devices, then it’s cheaper for companies to run these services. On the privacy side, if your data, schedule and other personal details can remain on your device, while the services that know how to use that information for a customized personal assistant experience also run there, then little to none of your information ever has to go to the cloud.
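
To make the idea a bit more concrete, here’s a minimal sketch in Python of how an app might route requests between an on-device model and a cloud service. The object and function names are hypothetical placeholders, and real routing logic would be far more sophisticated:

    # Minimal sketch of hybrid routing: prefer the on-device model and
    # fall back to the cloud only when a job is too big to run locally.
    # "local_model" and "cloud_client" are hypothetical placeholder objects.
    def generate_reply(prompt, local_model, cloud_client, max_local_words=500):
        if local_model is not None and len(prompt.split()) <= max_local_words:
            # Private path: the prompt never leaves the device.
            return local_model.generate(prompt)
        # Fallback path: the request goes to a cloud-based service.
        return cloud_client.generate(prompt)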

Recently, a number of companies have been talking about this idea of distributed computing for generative AI. At its recent Build developer conference, for example, Microsoft discussed what it calls Hybrid AI; think of it as the next generation of generative AI tools. Microsoft’s version is referred to as the Hybrid Loop, and it leverages a software development platform called ONNX Runtime that developers can use to take advantage of both the computing resources on devices and the company’s own Azure cloud computing services. In other words, it’s offering software developers a set of tools for doing distributed computing.
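
For developers, the key mechanism in ONNX Runtime is its list of “execution providers,” which tells the runtime which hardware to try first and what to fall back to. As a rough illustration (the model file name here is a placeholder):

    import onnxruntime as ort

    # See which accelerators this particular device actually exposes.
    print(ort.get_available_providers())

    # Preference order: Qualcomm NPU first, then DirectML (GPU), then CPU.
    # The runtime skips any provider that isn't available on the machine.
    # "assistant.onnx" is a hypothetical model file.
    session = ort.InferenceSession(
        "assistant.onnx",
        providers=["QNNExecutionProvider", "DmlExecutionProvider",
                   "CPUExecutionProvider"],
    )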

Phone chipmaker Qualcomm, whose chips and/or modems are found in most of the smartphones sold in the US, has also been talking about the hybrid AI concept and some of its other benefits. For its part, the company has created a set of AI software services called the Qualcomm AI Stack that makes it much easier to run generative AI tools on smartphones. In fact, the company has demonstrated the image-generating tool Stable Diffusion running on phones powered by its chips.

Speaking of semiconductor chips, as great as the concept of hybrid AI and distributed computing may sound, the only way to make it possible is to supercharge the capabilities of our devices. In order to run the foundation AI models that power generative AI apps and services directly (or even partially) on your devices, we’re going to see a whole new range of AI accelerator chips coming into PCs and smartphones over the next year or so.

On top of that, it turns out that OS companies like Microsoft and Google need to develop more support for these chips. At its Build event, for example, Microsoft pointed out that some of its underlying work for Hybrid AI will be able to leverage the CPU, GPU, NPU (neural processing unit), and potentially other specialized AI accelerators found in modern PCs. That means having newer processors from Intel, AMD and Qualcomm, as well as GPUs from Nvidia and AMD, is going to become more important than ever.

In recognition of this, many of the big chip companies have made announcements in this area. AMD, for example, has announced a new version of its Ryzen line of CPUs, the Ryzen 7040, that integrates a dedicated AI accelerator. Similarly, Intel’s next-generation CPU line, codenamed Meteor Lake, is rumored to be its first to include a dedicated AI accelerator. Both of these chips are expected to ship later this year.

In addition, Qualcomm’s 8cx line of Arm-based processors for PCs includes dedicated AI acceleration capabilities, and a new version is expected later this year as well. Qualcomm has also demonstrated that some of its newer Snapdragon 8 Gen 2 processors for premium phones (found in Android smartphones from companies like Samsung and Motorola) can run generative AI models and applications directly on the phone.

To be clear, at the present moment the vast majority of generative AI software and services still run in the cloud. The computing requirements of tools like the full ChatGPT can only be met with huge numbers of cloud-based servers. Over time, however, we’re going to see both new types of smaller AI models and clever ways of shifting the computing workloads that generative AI demands onto our devices. When that happens, even more mind-blowing AI-powered capabilities will start to become available.
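
One of the most promising techniques for making those smaller models is quantization: storing a model’s numbers at lower precision (8-bit integers instead of 32-bit floats, for example) so it can fit into a device’s memory. As a rough sketch of what that looks like using ONNX Runtime’s quantization tools (both file names are placeholders):

    from onnxruntime.quantization import quantize_dynamic, QuantType

    # Convert a model's float32 weights to int8, typically shrinking the
    # file to roughly a quarter of its size at some cost in accuracy.
    # Both file names are hypothetical placeholders.
    quantize_dynamic(
        model_input="assistant.onnx",
        model_output="assistant.int8.onnx",
        weight_type=QuantType.QInt8,
    )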

As you can tell, generative AI is causing massive disruptions across the entire tech world, and its implications go far deeper than they first appear. While it can be a bit overwhelming to keep track of it all, the important thing to remember is that we’re embarking on one of the most exciting new eras of computing, across PCs, smartphones, and all other devices, in quite some time. Hang on and enjoy the ride.

Here’s a link to the original article: https://www.linkedin.com/pulse/hybrid-ai-moving-generative-tech-from-cloud-our-bob-o-donnell

Bob O’Donnell is the president and chief analyst of TECHnalysis Research, LLC, a market research firm that provides strategic consulting and market research services to the technology industry and professional financial community. You can follow him on LinkedIn at Bob O’Donnell or on Twitter @bobodtech.